Regression Dicontinuity Design

Causality at the Cutoff

Bogdan G. Popescu

John Cabot University

Table of Contents

Arbitrary Cutoffs and Causal Inferences RDD Assumptions

Intro

How can we identify causal effects using observational data?

Some of the ways to conduct causal analysis with observational data, we need to run

  • differences in differences
  • regression discontinuity design

RDD

RDD are about arbitrary rules and thresholds determining access to policy programs

  • subjects are in the program if they have scores above a specific threshold
  • subject are not in the program if they have scores below a specific threshold

Or the other way around

Running or forcing variable - index that measures eligibility for the program (e.g. scores)
Cutoff / cutpoint / threshold - the number that the determines access to the program

RDD

RDD

Student ID Exam Score Scholarship Awarded? First-Year GPA
101 83 No 3.2
102 84 No 3.3
103 85 Yes 3.6
104 86 Yes 3.5
105 87 Yes 3.7
106 84.5 No 3.4
107 85.1 Yes 3.6

RDD

Hypothetical tutoring program

Students take an entrance exam

Those who score 70 or lower get a free tutor for the year

Students then take an exit exam at the end of the year

RDD

Hypothetical tutoring program

Show the code
# Fake program data
set.seed(1234)
num_students <- 1000
tutoring <- tibble(
  id = 1:num_students,
  entrance_exam = rbeta(num_students, shape1 = 7, shape2 = 2),
  exit_exam = rbeta(num_students, shape1 = 5, shape2 = 3)
) |> 
  mutate(entrance_exam = entrance_exam * 100,
         tutoring = entrance_exam <= 70) |> 
  mutate(exit_exam = exit_exam * 40 + 10 * tutoring + entrance_exam / 2) |> 
  mutate(tutoring_fuzzy = ifelse(entrance_exam > 60 & entrance_exam < 80,
                                 sample(c(TRUE, FALSE), n(), replace = TRUE),
                                 tutoring)) |> 
  mutate(tutoring_text = factor(tutoring, levels = c(FALSE, TRUE),
                                labels = c("No tutor", "Tutor")),
         tutoring_fuzzy_text = factor(tutoring_fuzzy, levels = c(FALSE, TRUE),
                                      labels = c("No tutor", "Tutor"))) |> 
  mutate(entrance_centered = entrance_exam - 70)
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +
  geom_vline(xintercept = 70, linewidth = 2, color = "#FFDC00") + 
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4,
             position = position_jitter(width = 0, height = 0.15, seed = 1234)) + 
  labs(x = "Entrance exam score", y = NULL) + 
  guides(fill = "none") +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  theme_bw(base_size = 28)

Intuition

Hypothetical tutoring program

The people right before and right after the threshold are very similar

The idea is that we can measure the effect of the program by examining students just around the threshold

Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +
  geom_vline(xintercept = 70, linewidth = 2, color = "#FFDC00") + 
    annotate(geom = "rect", fill = "grey50", alpha = 0.25, ymin = -Inf, ymax = Inf,
           xmin = 70 - 5,  xmax = 70 + 5) +
    annotate(geom = "rect", fill = "grey50", alpha = 0.5, ymin = -Inf, ymax = Inf,
           xmin = 70 - 2,  xmax = 70 + 2) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4,
             position = position_jitter(width = 0, height = 0.15, seed = 1234)) + 
  labs(x = "Entrance exam score", y = NULL) + 
  guides(fill = "none") +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  theme_bw(base_size = 28)

Intuition

Hypothetical tutoring program

The people right before and right after the threshold are very similar

The idea is that we can measure the effect of the program by examining students just around the threshold.

Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +
  geom_vline(xintercept = 70, linewidth = 2, color = "#FFDC00") + 
    annotate(geom = "rect", fill = "grey50", alpha = 0.25, ymin = -Inf, ymax = Inf,
           xmin = 70 - 5,  xmax = 70 + 5) +
    annotate(geom = "rect", fill = "grey50", alpha = 0.5, ymin = -Inf, ymax = Inf,
           xmin = 70 - 2,  xmax = 70 + 2) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4,
             position = position_jitter(width = 0, height = 0.15, seed = 1234)) + 
  labs(x = "Entrance exam score", y = NULL) + 
  guides(fill = "none") +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  theme_bw(base_size = 28)+
    coord_cartesian(xlim = c(70 - 5, 70 + 5))

Intuition

Hypothetical tutoring program

Show the code
ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4)+
  geom_vline(xintercept = 70, size = 2, color = "#FFDC00") + 
  labs(x = "Entrance exam score", y = "Exit exam score") + 
  #guides(fill = "none") +
    guides(fill = guide_legend(reverse = TRUE)) +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  theme_bw(base_size = 28)+
    coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))

Intuition

Hypothetical tutoring program

Show the code
ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4)+
  geom_vline(xintercept = 70, size = 2, color = "#FFDC00") + 
  labs(x = "Entrance exam score", y = "Exit exam score") + 
  #guides(fill = "none") +
    guides(fill = guide_legend(reverse = TRUE)) +
  geom_smooth(data = filter(tutoring, entrance_exam <= 70),
              method = "lm",
              color="red",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  geom_smooth(data = filter(tutoring, entrance_exam > 70),
              method = "lm", 
              color="black",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
    scale_fill_manual(values = c("black", "red"), name = NULL) +
  scale_color_manual(values = c("black", "red")) +
  theme_bw(base_size = 28)+
    coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))

Intuition

Hypothetical tutoring program

Show the code
library("broom")
tutoring <- tutoring |>
  mutate(tutoring = relevel(factor(tutoring), ref = "TRUE"))  # "TRUE" = treated is reference

rdd_tutoring <- lm(exit_exam ~ entrance_centered + tutoring, data = tutoring) |> 
  tidy()

effect_control <- filter(rdd_tutoring, term == "(Intercept)")$estimate
late <- filter(rdd_tutoring, term == "tutoringFALSE")$estimate
effect_treatment <- effect_control + late


ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4) +
  geom_vline(xintercept = 70, size = 2, color = "#FFDC00") + 
  labs(x = "Entrance exam score", y = "Exit exam score") + 
  #guides(fill = "none") +
    guides(fill = guide_legend(reverse = TRUE)) +
  geom_smooth(data = filter(tutoring, entrance_exam <= 70),
              method = "lm",
              color="red",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  geom_smooth(data = filter(tutoring, entrance_exam > 70),
              method = "lm", 
              color="black",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  scale_color_manual(values = c("black", "red")) +
  theme_bw(base_size = 28)+
  annotate(geom = "segment", x = 70, xend = 70, 
           y = effect_control, yend = effect_treatment,
           size = 4, color = "red") +
  annotate(geom = "label", x = 75, y = effect_treatment - (late / 2),
           label = \n(causal\n effect)",
           size=5,
           color = "red", fill = alpha(c("white"),0.8))+
    coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))

Intuition

Hypothetical tutoring program

The \(\delta\) measures the gap in outcome for people on both sides of the cutpoint

\(\delta\) is also called LATE (local average treatment effecrs)

The size of the gap depends on how you draw the lines on each side of the cutoff

The type of lines you choose can change the estimate of \(\delta\)

  • parametric vs. non-parametric
  • bandwidths
  • kernels

Intuition

Hypothetical tutoring program

Show the code
ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4) +
  geom_vline(xintercept = 70, size = 2, color = "#FFDC00") + 
  labs(x = "Entrance exam score", y = "Exit exam score") + 
  #guides(fill = "none") +
    guides(fill = guide_legend(reverse = TRUE)) +
  geom_smooth(data = filter(tutoring, entrance_exam <= 70),
              method = "lm",
              color="red",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  geom_smooth(data = filter(tutoring, entrance_exam > 70),
              method = "lm", 
              color="black",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  scale_color_manual(values = c("black", "red")) +
  theme_bw(base_size = 28)+
  annotate(geom = "segment", x = 70, xend = 70, 
           y = effect_control, yend = effect_treatment,
           size = 4, color = "red") +
  annotate(geom = "label", x = 75, y = effect_treatment - (late / 2),
           label = \n(causal\n effect)",
           size=5,
           color = "red", fill = alpha(c("white"),0.8))+
    coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))

Intuition

Hypothetical tutoring program

Show the code
program_data <- tutoring %>%
  mutate(entrance_centered = 
           entrance_exam - 70)
model1 <- lm(exit_exam ~ 
               entrance_centered + tutoring,
             data = program_data)

x<-tidy(model1)
late<-round(x$estimate[x$term=="tutoringFALSE"],2)
const<-round(x$estimate[x$term=="(Intercept)"],2)


ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4) +
  geom_vline(xintercept = 70, size = 2, color = "#FFDC00") + 
  labs(x = "Entrance exam score", y = "Exit exam score") + 
  #guides(fill = "none") +
    guides(fill = guide_legend(reverse = TRUE)) +
  geom_smooth(data = filter(tutoring, entrance_exam <= 70),
              method = "lm",
              color="red",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  geom_smooth(data = filter(tutoring, entrance_exam > 70),
              method = "lm", 
              color="black",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  scale_color_manual(values = c("black", "red")) +
  theme_bw(base_size = 28)+
  annotate(geom = "segment", x = 70, xend = 70, 
           y = effect_control, yend = effect_treatment,
           size = 4, color = "red") +
  annotate(geom = "label", x = 75, y = const + (late / 2),
           label = paste("δ=", late),
           size=5,
           color = "red", fill = alpha(c("white"),0.8))+
    coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))

Intuition

Hypothetical tutoring program

Show the code
# 1. Fit cubic models
fit_control <- lm(exit_exam ~ poly(entrance_exam, 3, raw = TRUE),
                  data = filter(tutoring, entrance_exam < 70))

fit_treat <- lm(exit_exam ~ poly(entrance_exam, 3, raw = TRUE),
                data = filter(tutoring, entrance_exam >= 70))

# 2. Get tidy versions
tidy_control <- tidy(fit_control)
tidy_treat <- tidy(fit_treat)

cutoff <- 70
effect_control <- predict(fit_control, newdata = data.frame(entrance_exam = cutoff))
effect_treatment <- predict(fit_treat, newdata = data.frame(entrance_exam = cutoff))
delta <- round(effect_treatment-effect_control, 2)


# Create prediction data for plotting
x_vals <- tibble(entrance_exam = seq(30, 100, length.out = 300))

preds <- x_vals %>%
  mutate(
    exit_exam = ifelse(
      entrance_exam < 70,
      predict(fit_control, newdata = x_vals),
      predict(fit_treat, newdata = x_vals)
    )
  )

ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4) +
  geom_line(data = preds, aes(x = entrance_exam, y = exit_exam),
            inherit.aes = FALSE, color = "black", size = 1.5) +
  geom_vline(xintercept = 70, color = "#FFDC00", size = 2) +
  theme_bw(base_size = 28)+
scale_fill_manual(values = c("Tutor" = "red", "No tutor" = "black"), name = NULL,
                  guide = guide_legend(reverse = TRUE))+
  labs(x = "Entrance exam score", y = "Exit exam score")+
    annotate(geom = "label", x = 75, y = const + (late / 2),
           label = paste("δ=", delta),
           size=5,
           color = "red", fill = alpha(c("white"),0.8))+
      coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))+
      guides(fill = guide_legend(reverse = TRUE))

Intuition

Hypothetical tutoring program

Show the code
library(dplyr)

# Add sine-transformed variable
tutoring <- tutoring |>
  mutate(sin_term = sin(0.2 * entrance_exam))

# Control group: entrance_exam < 70
fit_trig_control <- nls(
  exit_exam ~ a + b * entrance_exam + c * sin(d * entrance_exam),
  data = filter(tutoring, entrance_exam < 70),
  start = list(a = 55, b = 0.4, c = 15, d = 0.4)
)

# Treatment group: entrance_exam >= 70
fit_trig_treat <- nls(
  exit_exam ~ a + b * entrance_exam + c * sin(d * entrance_exam),
  data = filter(tutoring, entrance_exam >= 70),
  start = list(a = 55, b = 0.4, c = 15, d = 0.4)
)

x_vals <- tibble(entrance_exam = seq(30, 100, length.out = 300))

preds_trig <- x_vals |>
  mutate(
    exit_exam = ifelse(
      entrance_exam < 70,
      predict(fit_trig_control, newdata = x_vals),
      predict(fit_trig_treat, newdata = x_vals)
    )
  )

cutoff <- 70

# Create a new data frame for x = 70
cutoff_df <- data.frame(entrance_exam = cutoff)

# Predicted outcome from each model
effect_control <- predict(fit_trig_control, newdata = cutoff_df)
effect_treatment <- predict(fit_trig_treat, newdata = cutoff_df)

# Compute the treatment effect (jump at cutoff)
delta <- round(effect_treatment-effect_control,2)

ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4) +
  geom_line(data = preds_trig, aes(x = entrance_exam, y = exit_exam),
            inherit.aes = FALSE, color = "black", size = 1.5) +
  geom_vline(xintercept = 70, color = "#FFDC00", size = 2) +
  theme_bw(base_size = 28)+
scale_fill_manual(values = c("Tutor" = "red", "No tutor" = "black"), name = NULL,
                  guide = guide_legend(reverse = TRUE))+
  labs(x = "Entrance exam score", y = "Exit exam score")+
    annotate(geom = "label", x = 75, y = const + (late / 2),
           label = paste("δ=", delta),
           size=5,
           color = "red", fill = alpha(c("white"),0.8))+
      coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))+
      guides(fill = guide_legend(reverse = TRUE))

Lines and Fit

Parametric line should fit the data pretty well

Non-parametric lines use the data to find the best line, often with windows and moving averages.

For example, one common methos is LOESS (Locally Estimated Scatterplot Smoothing)

Lines and Fit

Show the code
model_control <- lm(exit_exam ~ entrance_exam,
                    data = filter(tutoring, entrance_exam <= 70))

model_treat <- lm(exit_exam ~ entrance_exam,
                  data = filter(tutoring, entrance_exam > 70))

effect_control <- predict(model_control, newdata = data.frame(entrance_exam = 70))
effect_treatment <- predict(model_treat, newdata = data.frame(entrance_exam = 70))

late <- round(effect_treatment-effect_control,2)

ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4) +
  geom_vline(xintercept = 70, size = 2, color = "#FFDC00") + 
  labs(x = "Entrance exam score", y = "Exit exam score") + 
  #guides(fill = "none") +
    guides(fill = guide_legend(reverse = TRUE)) +
  geom_smooth(data = filter(tutoring, entrance_exam <= 70),
              method = "loess",
              color="red",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  geom_smooth(data = filter(tutoring, entrance_exam > 70),
              method = "loess", 
              color="black",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  scale_color_manual(values = c("black", "red")) +
  theme_bw(base_size = 28)+
  annotate(geom = "segment", x = 70, xend = 70, 
           y = effect_control, yend = effect_treatment,
           size = 4, color = "red") +
#    annotate(geom = "label", x = 75, y = const + (late / 2),
#           label = paste("δ=", late),
#           size=5,
#           color = "red", fill = alpha(c("white"),0.8))+
    coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))

Measuring the Gap

Sharp RD estimates using local polynomial regression.

Number of Obs.                 1000
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                  238          762
Eff. Number of Obs.             121          184
Order est. (p)                    1            1
Order bias  (q)                   2            2
BW est. (h)                   7.616        7.616
BW bias (b)                  11.669       11.669
rho (h/b)                     0.653        0.653
Unique Obs.                     238          762

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
=============================================================================
  Conventional    -9.992     1.708    -5.852     0.000   [-13.339 , -6.646]    
        Robust         -         -    -4.992     0.000   [-14.244 , -6.212]    
=============================================================================

Measuring the Gap

Show the code
library(rdrobust)
rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c = 70)
Call: rdrobust

Number of Obs.                 1000
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                  238          762
Eff. Number of Obs.             121          184
Order est. (p)                    1            1
Order bias  (q)                   2            2
BW est. (h)                   7.616        7.616
BW bias (b)                  11.669       11.669
rho (h/b)                     0.653        0.653
Unique Obs.                     238          762
Show the code
program_data <- tutoring %>%
  mutate(entrance_centered = 
           entrance_exam - 70)
model1 <- lm(exit_exam ~ 
               entrance_centered + tutoring,
             data = program_data)

gap<-round(x$estimate[x$term=="tutoringFALSE"],2)

x<-tidy(model1)
late<-round(x$estimate[x$term=="tutoringFALSE"],2)
const<-round(x$estimate[x$term=="(Intercept)"],2)


ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +
  geom_point(size = 3, pch = 21, color = "white", alpha = 0.4) +
  geom_vline(xintercept = 70, size = 2, color = "#FFDC00") + 
  labs(x = "Entrance exam score", y = "Exit exam score") + 
  #guides(fill = "none") +
    guides(fill = guide_legend(reverse = TRUE)) +
  geom_smooth(data = filter(tutoring, entrance_exam <= 70),
              method = "lm",
              color="red",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  geom_smooth(data = filter(tutoring, entrance_exam > 70),
              method = "lm", 
              color="black",
              #aes(color = tutoring_text),
              size = 2, show.legend = FALSE) +
  scale_fill_manual(values = c("black", "red"), name = NULL) +
  scale_color_manual(values = c("black", "red")) +
  theme_bw(base_size = 28)+
  annotate(geom = "segment", x = 70, xend = 70, 
           y = effect_control, yend = effect_treatment,
           size = 4, color = "red") +
  annotate(geom = "label", x = 75, y = const + (late / 2),
           label = paste("δ=", gap),
           size=5,
           color = "red", fill = alpha(c("white"),0.8))+
    coord_sf(xlim = c(30, 100), 
           ylim = c(40, 90))